Offline Imitation Learning with a Misspecified Simulator

Neural Information Processing Systems

In real-world decision-making tasks, learning an optimal policy without trial and error is an appealing challenge. When expert demonstrations are available, imitation learning that mimics expert actions can learn a good policy efficiently. Learning in a simulator is another commonly adopted approach to avoid real-world trial and error. However, neither sufficient expert demonstrations nor high-fidelity simulators are easy to obtain. In this work, we investigate policy learning under the condition of a few expert demonstrations and a simulator with misspecified dynamics. Under the mild assumption that local states remain partially aligned despite a dynamics mismatch, we propose imitation learning with horizon-adaptive inverse dynamics (HIDIL), which matches simulator states with expert states over an $H$-step horizon and accurately recovers actions with inverse dynamics policies. In the real environment, HIDIL can effectively derive adapted actions from the matched states. Experiments are conducted in four MuJoCo locomotion environments with modified friction, gravity, and density configurations. The results show that HIDIL achieves significant improvements in performance and stability in all of the real environments, compared with imitation learning methods and transfer methods from reinforcement learning.
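The core mechanism the abstract describes, recovering an action from a pair of states via an inverse dynamics model, can be illustrated with a minimal sketch. This is not the paper's implementation: HIDIL uses learned neural inverse dynamics policies and an $H$-step matching horizon, whereas the toy model below (class name `InverseDynamicsPolicy` and the linear least-squares fit are both illustrative assumptions) conditions on a single state pair and fits a linear map from transition data.

```python
import numpy as np

class InverseDynamicsPolicy:
    """Toy inverse dynamics model: predicts the action that moves the
    agent from state s to a target state s_target.

    A linear least-squares fit stands in for the learned network used in
    HIDIL (an illustrative simplification, not the paper's method)."""

    def __init__(self):
        self.W = None  # weight matrix mapping [s, s_target] -> action

    def fit(self, states, next_states, actions):
        # Stack each (s, s') pair into one feature vector and solve the
        # least-squares problem X @ W ~= actions.
        X = np.hstack([states, next_states])
        self.W, *_ = np.linalg.lstsq(X, actions, rcond=None)
        return self

    def act(self, state, target_state):
        # Recover the action believed to carry `state` to `target_state`.
        x = np.hstack([state, target_state])
        return x @ self.W


# Synthetic transitions with additive dynamics s' = s + a, so the true
# inverse dynamics is simply a = s' - s.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 3))
actions = rng.normal(size=(200, 3))
next_states = states + actions

policy = InverseDynamicsPolicy().fit(states, next_states, actions)
s, s_target = np.zeros(3), np.array([1.0, -0.5, 2.0])
recovered = policy.act(s, s_target)  # should be close to s_target - s
```

At deployment, the matched expert state plays the role of `target_state`: the policy is queried with the current real-environment state and the expert state it has been matched to, yielding an action adapted to the real dynamics.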


Review for NeurIPS paper: Offline Imitation Learning with a Misspecified Simulator

Neural Information Processing Systems

Summary and Contributions: The authors propose an improvement on existing approaches to imitation learning of policies for embodied agents. The approach is a hybrid between sim-to-real RL approaches (which require a simulator closely matching the real world) and real-world imitation learning approaches such as GAIL. The general idea of the paper is that there is a simulator which, however, is allowed to have different dynamics than the "real world". In particular, the assumption is that two policies can reach the same goal state from the same starting point within H steps in the real world. The algorithm is tested on OpenAI Gym environments, where both the real-world and simulator environments are simulations (with different parametrizations).


Review for NeurIPS paper: Offline Imitation Learning with a Misspecified Simulator

Neural Information Processing Systems

This paper received a wide spread of reviews and generated significant discussion amongst the reviewers. In the end, the majority of the reviewers agreed that while the main contribution was slightly lacking in novelty (in the sense that it was mostly a retargeting of a known technique to a new setting), it was still a valuable contribution. However, there was not total consensus, as R1 had significant concerns about how the paper was written. That said, the majority of reviewers consider the paper strong enough to be accepted, so I recommend acceptance, with the caveat that the authors pay close attention to R1's revision suggestions to improve the communication of ideas.
